{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 基于 x86 CPU 的神经网络自动调度\n",
        "**原作者**: [Lianmin Zheng](https://github.com/merrymercy), [Chengfan Jia](https://github.com/jcf94/)\n",
        "\n",
        "针对特定设备和工作负载的自动调优对于获得最佳性能至关重要。下面是关于如何用自动调度器调优 x86 CPU 的整个神经网络的教程。\n",
        "\n",
        "为了自动调优神经网络，需要将网络划分为小的子图，并独立地调优它们。每个子图被视为一个搜索任务。任务调度程序对时间进行切片，并动态地为这些任务分配时间资源。任务调度器预测每个任务对端到端执行时间的影响，并优先考虑能够最大程度减少执行时间的任务。\n",
        "\n",
        "对于每个子图，使用 `tvm/python/topi` 中的 compute 声明来获得张量表达式形式的计算 DAG。然后，使用自动调度器来构造 DAG 的搜索空间，并搜索良好的调度（低级优化）。\n",
        "\n",
        "与基于模板的 [autotvm](../tune_with_autotvm/index) 依赖手动模板定义搜索空间不同，自动调度程序不需要任何调度模板。换句话说，自动调度器只使用 `tvm/python/topi` 中的 compute，而不使用现有的调度模板。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "import tvm\n",
        "from tvm import relay, auto_scheduler\n",
        "from tvm.relay import data_dep_optimization as ddo\n",
        "import tvm.relay.testing\n",
        "from tvm.contrib import graph_executor"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 定义网络\n",
        "\n",
        "首先，需要用 relay 前端 AP I定义网络。可以从 {mod}`tvm.relay.testing` 加载一些预定义的网络。还可以从 MXNet、ONNX、PyTorch 和 TensorFlow 加载模型（参见[前端教程](../compile_models/index)）。\n",
        "\n",
        "对于卷积神经网络，尽管自动调度器可以在任何布局下正确工作，但我们发现 NHWC 布局通常能获得最佳性能。我们还通过自动调度器实现了对 NHWC 布局的更多优化。因此，建议将模型转换为 NHWC 布局以使用自动调度器。可以使用 [ConvertLayout pass](convert-layout-usage) 在 TVM 中进行布局转换。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def get_network(name, batch_size, layout=\"NHWC\", dtype=\"float32\", use_sparse=False):\n",
        "    \"\"\"获取网络的符号定义和随机权值\"\"\"\n",
        "\n",
        "    # 自动调度首选 NHWC 布局\n",
        "    if layout == \"NHWC\":\n",
        "        image_shape = (224, 224, 3)\n",
        "    elif layout == \"NCHW\":\n",
        "        image_shape = (3, 224, 224)\n",
        "    else:\n",
        "        raise ValueError(\"无效布局: \" + layout)\n",
        "\n",
        "    input_shape = (batch_size,) + image_shape\n",
        "    output_shape = (batch_size, 1000)\n",
        "\n",
        "    if name.startswith(\"resnet-\"):\n",
        "        n_layer = int(name.split(\"-\")[1])\n",
        "        mod, params = relay.testing.resnet.get_workload(\n",
        "            num_layers=n_layer,\n",
        "            batch_size=batch_size,\n",
        "            layout=layout,\n",
        "            dtype=dtype,\n",
        "            image_shape=image_shape,\n",
        "        )\n",
        "    elif name.startswith(\"resnet3d-\"):\n",
        "        n_layer = int(name.split(\"-\")[1])\n",
        "        mod, params = relay.testing.resnet.get_workload(\n",
        "            num_layers=n_layer,\n",
        "            batch_size=batch_size,\n",
        "            layout=layout,\n",
        "            dtype=dtype,\n",
        "            image_shape=image_shape,\n",
        "        )\n",
        "    elif name == \"mobilenet\":\n",
        "        mod, params = relay.testing.mobilenet.get_workload(\n",
        "            batch_size=batch_size, layout=layout, dtype=dtype, image_shape=image_shape\n",
        "        )\n",
        "    elif name == \"squeezenet_v1.1\":\n",
        "        assert layout == \"NCHW\", \"squeezenet_v1.1 only supports NCHW layout\"\n",
        "        mod, params = relay.testing.squeezenet.get_workload(\n",
        "            version=\"1.1\",\n",
        "            batch_size=batch_size,\n",
        "            dtype=dtype,\n",
        "            image_shape=image_shape,\n",
        "        )\n",
        "    elif name == \"inception_v3\":\n",
        "        input_shape = (batch_size, 3, 299, 299) if layout == \"NCHW\" else (batch_size, 299, 299, 3)\n",
        "        mod, params = relay.testing.inception_v3.get_workload(batch_size=batch_size, dtype=dtype)\n",
        "    elif name == \"mxnet\":\n",
        "        # an example for mxnet model\n",
        "        from mxnet.gluon.model_zoo.vision import get_model\n",
        "\n",
        "        assert layout == \"NCHW\"\n",
        "\n",
        "        block = get_model(\"resnet50_v1\", pretrained=True)\n",
        "        mod, params = relay.frontend.from_mxnet(block, shape={\"data\": input_shape}, dtype=dtype)\n",
        "        net = mod[\"main\"]\n",
        "        net = relay.Function(\n",
        "            net.params, relay.nn.softmax(net.body), None, net.type_params, net.attrs\n",
        "        )\n",
        "        mod = tvm.IRModule.from_expr(net)\n",
        "    elif name == \"mlp\":\n",
        "        mod, params = relay.testing.mlp.get_workload(\n",
        "            batch_size=batch_size, dtype=dtype, image_shape=image_shape, num_classes=1000\n",
        "        )\n",
        "    else:\n",
        "        raise ValueError(\"Network not found.\")\n",
        "\n",
        "    if use_sparse:\n",
        "        from tvm.topi.sparse.utils import convert_model_dense_to_sparse\n",
        "\n",
        "        mod, params = convert_model_dense_to_sparse(mod, params, bs_r=4, random_params=True)\n",
        "\n",
        "    return mod, params, input_shape, output_shape\n",
        "\n",
        "\n",
        "# 定义神经网络和编译目标。\n",
        "# 如果目标机器支持 avx512 指令，将 \"llvm -mcpu=core-avx2\" 替换为 \"llvm -mcpu=skylake-avx512\"\n",
        "network = \"resnet-50\"\n",
        "use_sparse = False\n",
        "batch_size = 1\n",
        "layout = \"NHWC\"\n",
        "target = tvm.target.Target(\"llvm -mcpu=core-avx2\")\n",
        "dtype = \"float32\"\n",
        "log_file = f\"{network}-{layout}-B{batch_size:d}-{target.kind.name}.json\""
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 提取搜索任务\n",
        "\n",
        "接下来，从网络中提取搜索任务及其权重。任务的权重是该任务的子图在整个网络中出现的次数。通过使用权重，可以将网络的端到端延迟近似为 `sum(latency[t] * weight[t])`，其中 `latency[t]` 是任务的延迟，`weight[t]` 是任务的权重。任务调度器会优化这个目标。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "获取模型...\n",
            "提取任务...\n",
            "========== Task 0  (workload key: [\"2d10de6646307f0e3e5cf4b31c20e69b\", [1, 56, 56, 64], [1, 1, 64, 256], [1, 56, 56, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 64]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 64, 256]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "\n",
            "========== Task 1  (workload key: [\"3060808fc5c74e18b1276729071fbae0\", [1, 14, 14, 256], [1, 1, 256, 1024], [1, 14, 14, 1024], [1, 14, 14, 1024]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 256]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 256, 1024]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 14, 14, 1024]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "\n",
            "========== Task 2  (workload key: [\"76afb7bf408a1ffa0b8b7bc09d077dc3\", [1, 14, 14, 256], [1, 1, 256, 1024], [1, 14, 14, 1024], [1, 1, 1, 1024], [1, 14, 14, 1024]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 256]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 256, 1024]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 14, 14, 1024]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "p3 = PLACEHOLDER [1, 1, 1, 1024]\n",
            "T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + p3[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 3  (workload key: [\"2beb39e9afe4c74822fffbcbb8533595\", [1, 14, 14, 1024], [1, 1, 1024, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 1024]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 1024, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 4  (workload key: [\"0fad1b42d0d33418e0a8d15d3bbad3c9\", [1, 56, 56, 256], [1, 1, 256, 512], [1, 28, 28, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 256]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 256, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "\n",
            "========== Task 5  (workload key: [\"38552500208b25b4035682b0e93cbce3\", [1, 14, 14, 256], [6, 6, 256, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 256]\n",
            "data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), p0[i0, (i1 - 1), (i2 - 1), i3], 0f)\n",
            "input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*4) + eps), ((floormod(p, 4)*4) + nu), ci]\n",
            "B(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 6) == 5)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 6) == 4)),  ..(OMITTED)..  (floormod(j, 6) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 6) == 0)), 1f, 0f))))))))))))))))))))))))))))))))))))\n",
            "data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])\n",
            "p1 = PLACEHOLDER [6, 6, 256, 256]\n",
            "bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*p1[eps, nu, co, ci])\n",
            "A(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 4) == 2)),  ..(OMITTED)..  6) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))))))))))\n",
            "inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])\n",
            "conv2d_winograd(n, h, w, co) = inverse[floormod(h, 4), floormod(w, 4), ((((n*4)*4) + (floordiv(h, 4)*4)) + floordiv(w, 4)), co]\n",
            "p2 = PLACEHOLDER [1, 1, 1, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 6  (workload key: [\"6d628209072e3e3dd8f49359935acea6\", [1, 14, 14, 1024], [1, 1, 1024, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 1024]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 1024, 256]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 7  (workload key: [\"cfd09cf1ca9e943f0ee12a18813a5c75\", [1, 28, 28, 128], [6, 6, 128, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 128]\n",
            "data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 29)) && (i2 >= 1)) && (i2 < 29)), p0[i0, (i1 - 1), (i2 - 1), i3], 0f)\n",
            "input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 49), ((floormod(floordiv(p, 7), 7)*4) + eps), ((floormod(p, 7)*4) + nu), ci]\n",
            "B(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 6) == 5)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 6) == 4)),  ..(OMITTED)..  (floormod(j, 6) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 6) == 0)), 1f, 0f))))))))))))))))))))))))))))))))))))\n",
            "data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])\n",
            "p1 = PLACEHOLDER [6, 6, 128, 128]\n",
            "bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*p1[eps, nu, co, ci])\n",
            "A(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 4) == 2)),  ..(OMITTED)..  6) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))))))))))\n",
            "inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])\n",
            "conv2d_winograd(n, h, w, co) = inverse[floormod(h, 4), floormod(w, 4), ((((n*7)*7) + (floordiv(h, 4)*7)) + floordiv(w, 4)), co]\n",
            "p2 = PLACEHOLDER [1, 1, 1, 128]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 8  (workload key: [\"3060808fc5c74e18b1276729071fbae0\", [1, 56, 56, 64], [1, 1, 64, 256], [1, 56, 56, 256], [1, 56, 56, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 64]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 64, 256]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 56, 56, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "\n",
            "========== Task 9  (workload key: [\"3060808fc5c74e18b1276729071fbae0\", [1, 28, 28, 128], [1, 1, 128, 512], [1, 28, 28, 512], [1, 28, 28, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 128]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 128, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 28, 28, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "\n",
            "========== Task 10  (workload key: [\"7d79c516e212fe1d73f5dbb90eaca2cf\", [1, 1000], [1, 1000]]) ==========\n",
            "p0 = PLACEHOLDER [1, 1000]\n",
            "T_softmax_maxelem(i0) max= p0[i0, k]\n",
            "T_softmax_exp(i0, i1) = tir.exp((p0[i0, i1] - T_softmax_maxelem[i0]))\n",
            "T_softmax_expsum(i0) += T_softmax_exp[i0, k]\n",
            "T_softmax_norm(i0, i1) = (T_softmax_exp[i0, i1]/T_softmax_expsum[i0])\n",
            "\n",
            "========== Task 11  (workload key: [\"8c53ca2904398da2889aa7508082d7bb\", [1, 7, 7, 2048], [1, 1, 1, 2048]]) ==========\n",
            "p0 = PLACEHOLDER [1, 7, 7, 2048]\n",
            "adaptive_pool_sum(ax0, ax1, ax2, ax3) += p0[ax0, ((ax1*7) + rv0), ((ax2*7) + rv1), ax3]\n",
            "adaptive_pool_avg(ax0, ax1, ax2, ax3) = (adaptive_pool_sum[ax0, ax1, ax2, ax3]/(float32((select((bool)1, ((ax1 + 1)*7), (((ax1 + 1)*7) + 1)) - (ax1*7)))*float32((select((bool)1, ((ax2 + 1)*7), (((ax2 + 1)*7) + 1)) - (ax2*7)))))\n",
            "\n",
            "========== Task 12  (workload key: [\"3060808fc5c74e18b1276729071fbae0\", [1, 7, 7, 512], [1, 1, 512, 2048], [1, 7, 7, 2048], [1, 7, 7, 2048]]) ==========\n",
            "p0 = PLACEHOLDER [1, 7, 7, 512]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 512, 2048]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 7, 7, 2048]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "\n",
            "========== Task 13  (workload key: [\"6d628209072e3e3dd8f49359935acea6\", [1, 28, 28, 512], [1, 1, 512, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 512]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 512, 128]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 128]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 14  (workload key: [\"2beb39e9afe4c74822fffbcbb8533595\", [1, 28, 28, 512], [1, 1, 512, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 512]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 512, 256]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 15  (workload key: [\"76afb7bf408a1ffa0b8b7bc09d077dc3\", [1, 56, 56, 64], [1, 1, 64, 256], [1, 56, 56, 256], [1, 1, 1, 256], [1, 56, 56, 256]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 64]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 64, 256]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 56, 56, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "p3 = PLACEHOLDER [1, 1, 1, 256]\n",
            "T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + p3[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 16  (workload key: [\"6d628209072e3e3dd8f49359935acea6\", [1, 56, 56, 256], [1, 1, 256, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 256]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 256, 64]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 64]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 17  (workload key: [\"6d628209072e3e3dd8f49359935acea6\", [1, 7, 7, 2048], [1, 1, 2048, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 7, 7, 2048]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 2048, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 18  (workload key: [\"0fad1b42d0d33418e0a8d15d3bbad3c9\", [1, 28, 28, 512], [1, 1, 512, 1024], [1, 14, 14, 1024]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 512]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 512, 1024]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "\n",
            "========== Task 19  (workload key: [\"76afb7bf408a1ffa0b8b7bc09d077dc3\", [1, 28, 28, 128], [1, 1, 128, 512], [1, 28, 28, 512], [1, 1, 1, 512], [1, 28, 28, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 28, 28, 128]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 128, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 28, 28, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "p3 = PLACEHOLDER [1, 1, 1, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + p3[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 20  (workload key: [\"d37380659057397544e056461ea3bad3\", [1, 56, 56, 64], [3, 3, 64, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 64]\n",
            "pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), p0[i0, (i1 - 1), (i2 - 1), i3], 0f)\n",
            "p1 = PLACEHOLDER [3, 3, 64, 64]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 64]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 21  (workload key: [\"d37380659057397544e056461ea3bad3\", [1, 7, 7, 512], [3, 3, 512, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========\n",
            "p0 = PLACEHOLDER [1, 7, 7, 512]\n",
            "pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 8)) && (i2 >= 1)) && (i2 < 8)), p0[i0, (i1 - 1), (i2 - 1), i3], 0f)\n",
            "p1 = PLACEHOLDER [3, 3, 512, 512]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 512]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 22  (workload key: [\"2beb39e9afe4c74822fffbcbb8533595\", [1, 56, 56, 256], [1, 1, 256, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 256]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 256, 128]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 128]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 23  (workload key: [\"f07e228ef5f642b386d23a62df615e7b\", [1, 7, 7, 512], [1, 1, 512, 2048], [1, 7, 7, 2048], [1, 1, 1, 2048], [1, 1, 1, 2048], [1, 7, 7, 2048]]) ==========\n",
            "p0 = PLACEHOLDER [1, 7, 7, 512]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 512, 2048]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 7, 7, 2048]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, ax1, ax2, ax3])\n",
            "p3 = PLACEHOLDER [1, 1, 1, 2048]\n",
            "T_multiply(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3]*p3[ax0, 0, 0, ax3])\n",
            "p4 = PLACEHOLDER [1, 1, 1, 2048]\n",
            "T_add(ax0, ax1, ax2, ax3) = (T_multiply[ax0, ax1, ax2, ax3] + p4[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 24  (workload key: [\"07f9fcad27bdd3233f86fe35a5185d33\", [1, 224, 224, 3], [7, 7, 3, 64], [1, 1, 1, 64], [1, 112, 112, 64]]) ==========\n",
            "p0 = PLACEHOLDER [1, 224, 224, 3]\n",
            "pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 3) && (i1 < 227)) && (i2 >= 3)) && (i2 < 227)), p0[i0, (i1 - 3), (i2 - 3), i3], 0f)\n",
            "p1 = PLACEHOLDER [7, 7, 3, 64]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 64]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 25  (workload key: [\"0fad1b42d0d33418e0a8d15d3bbad3c9\", [1, 14, 14, 1024], [1, 1, 1024, 2048], [1, 7, 7, 2048]]) ==========\n",
            "p0 = PLACEHOLDER [1, 14, 14, 1024]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 1024, 2048]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*p1[ry, rx, rc, ff])\n",
            "\n",
            "========== Task 26  (workload key: [\"00a059b856ac30ac172b6252254479a6\", [1, 2048], [1000, 2048], [1, 1000], [1, 1000]]) ==========\n",
            "p0 = PLACEHOLDER [1, 2048]\n",
            "p1 = PLACEHOLDER [1000, 2048]\n",
            "T_matmul_NT(i, j) += (p0[i, k]*p1[j, k])\n",
            "p2 = PLACEHOLDER [1, 1000]\n",
            "T_add(ax0, ax1) = (T_matmul_NT[ax0, ax1] + p2[ax0, ax1])\n",
            "\n",
            "========== Task 27  (workload key: [\"6d012ba18a086c11ee2b85c7324e16f2\", [1, 112, 112, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========\n",
            "p0 = PLACEHOLDER [1, 112, 112, 64]\n",
            "pad_temp(ax0, ax1, ax2, ax3) = tir.if_then_else(((((ax1 >= 1) && (ax1 < 113)) && (ax2 >= 1)) && (ax2 < 113)), p0[ax0, (ax1 - 1), (ax2 - 1), ax3], -3.40282e+38f)\n",
            "pool_max(ax0, ax1, ax2, ax3) max= pad_temp[ax0, ((ax1*2) + rv0), ((ax2*2) + rv1), ax3]\n",
            "p1 = PLACEHOLDER [1, 1, 1, 64]\n",
            "T_add(ax0, ax1, ax2, ax3) = (pool_max[ax0, ax1, ax2, ax3] + p1[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n",
            "========== Task 28  (workload key: [\"6d628209072e3e3dd8f49359935acea6\", [1, 56, 56, 64], [1, 1, 64, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========\n",
            "p0 = PLACEHOLDER [1, 56, 56, 64]\n",
            "pad_temp(i0, i1, i2, i3) = p0[i0, i1, i2, i3]\n",
            "p1 = PLACEHOLDER [1, 1, 64, 64]\n",
            "conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*p1[ry, rx, rc, ff])\n",
            "p2 = PLACEHOLDER [1, 1, 1, 64]\n",
            "T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + p2[ax0, 0, 0, ax3])\n",
            "T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)\n",
            "\n"
          ]
        }
      ],
      "source": [
        "# 从网络中提取任务\n",
        "print(\"获取模型...\")\n",
        "mod, params, input_shape, output_shape = get_network(\n",
        "    network,\n",
        "    batch_size,\n",
        "    layout,\n",
        "    dtype=dtype,\n",
        "    use_sparse=use_sparse,\n",
        ")\n",
        "print(\"提取任务...\")\n",
        "tasks, task_weights = auto_scheduler.extract_tasks(mod[\"main\"], params, target)\n",
        "\n",
        "for idx, task in enumerate(tasks):\n",
        "    print(\"========== Task %d  (workload key: %s) ==========\" % (idx, task.workload_key))\n",
        "    print(task.compute_dag)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 开始调优\n",
        "\n",
        "现在，设置了一些调优选项并启动搜索任务\n",
        "\n",
        "- `num_measure_trials` 是在调优期间可以使用的度量试验的数量。您可以将它设置为小的数字（例如，200）以进行快速演示运行。在实践中，建议将它设置在 `800 * len(tasks)` 左右，这通常足以让搜索收敛。例如 resnet-50 中有 29 个任务，所以可以将其设置为 20000。可以根据自己的时间预算调整该参数。\n",
        "- 此外，使用 `RecordToFile` 将测量记录转储到日志文件中，测量记录可以用于查询历史，恢复搜索，并在以后进行更多的分析。\n",
        "- 查阅 {mod}`tvm.auto_scheduler.TuningOptions`、{mod}`tvm.auto_scheduler.LocalRunner` 获取更多参数。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def run_tuning():\n",
        "    print(\"开始调优...\")\n",
        "    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)\n",
        "    tune_option = auto_scheduler.TuningOptions(\n",
        "        num_measure_trials=200,  # 将其更改为 20000 以实现最佳性能\n",
        "        runner=auto_scheduler.LocalRunner(repeat=10, enable_cpu_cache_flush=True),\n",
        "        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],\n",
        "    )\n",
        "\n",
        "    if use_sparse:\n",
        "        from tvm.topi.sparse.utils import sparse_sketch_rules\n",
        "\n",
        "        search_policy = [\n",
        "            auto_scheduler.SketchPolicy(\n",
        "                task,\n",
        "                program_cost_model=auto_scheduler.XGBModel(),\n",
        "                init_search_callbacks=sparse_sketch_rules(),\n",
        "            )\n",
        "            for task in tasks\n",
        "        ]\n",
        "\n",
        "        tuner.tune(tune_option, search_policy=search_policy)\n",
        "    else:\n",
        "        tuner.tune(tune_option)\n",
        "\n",
        "run_tuning()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "````{admonition} 解释在调优期间的打印信息\n",
        ":class: alert alert-info\n",
        "在调优过程中，控制台上将打印大量信息。它们用于调试目的。最重要的信息是任务调度器的输出。下表是示例输出。\n",
        "\n",
        "```c\n",
        "----------------------------------------------------------------------\n",
        "------------------------------  [ Task Scheduler ]\n",
        "----------------------------------------------------------------------\n",
        "|  ID  | Latency (ms) | Speed (GFLOPS) | Trials |\n",
        "-------------------------------------------------\n",
        "|    0 |        0.010 |           0.40 |     64 |\n",
        "|    1 |        0.087 |          47.19 |     64 |\n",
        "|    2 |        0.008 |          -0.00 |     64 |\n",
        "|    3 |        0.177 |         582.07 |     64 |\n",
        "|    4 |        0.268 |         862.37 |    256 |\n",
        "|    5 |        0.166 |         621.13 |    128 |\n",
        "|    6 |        0.170 |         605.10 |    128 |\n",
        "|    7 |        0.128 |         403.20 |     64 |\n",
        "|    8 |        0.189 |         545.71 |     64 |\n",
        "|    9 |        0.231 |        1001.01 |    448 |\n",
        "|   10 |        0.155 |         664.80 |    256 |\n",
        "|   11 |        0.155 |         662.86 |    256 |\n",
        "|   12 |        0.119 |         434.08 |     64 |\n",
        "|   13 |        0.199 |         522.13 |     64 |\n",
        "|   14 |        0.235 |         986.56 |    320 |\n",
        "|   15 |        0.149 |         689.13 |    128 |\n",
        "|   16 |        0.155 |         664.80 |    192 |\n",
        "|   17 |        0.151 |         340.64 |     64 |\n",
        "|   18 |        0.176 |         597.55 |    128 |\n",
        "|   19 |        0.220 |        1054.37 |    192 |\n",
        "|   20 |        0.150 |         686.01 |    128 |\n",
        "|   21 |        0.159 |         650.88 |    128 |\n",
        "|   22 |        0.073 |         358.19 |     64 |\n",
        "|   23 |        0.031 |          70.63 |     64 |\n",
        "|   24 |        0.251 |         947.73 |    128 |\n",
        "|   25 |        0.157 |         652.47 |    128 |\n",
        "|   26 |        0.215 |         954.84 |    128 |\n",
        "|   27 |        0.237 |         868.92 |    128 |\n",
        "|   28 |        0.266 |         774.06 |    128 |\n",
        "-------------------------------------------------\n",
        "Estimated total latency: 10.016 ms      Trials: 3992    Used time : 1131 s      Next ID: 15\n",
        "```\n",
        "\n",
        "该表列出了所有任务的延迟和（估计的）速度。它还列出了所有任务的测量试验分配。最后一行打印这些任务的总加权延迟，这可以粗略估计网络的端到端执行时间。最后一行还输出测试的总数、自动调优所花费的总时间和下一个要调优的任务的 id。\n",
        "\n",
        "也会有一些 \"tvm::Error\" 的错误，因为自动调度程序将尝试一些无效的调度。如果可以继续进行调优，则可以安全地忽略它们，因为这些错误与主进程隔离开来。\n",
        "````"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "```{admonition} 提前终止调优\n",
        ":class: alert alert-info\n",
        "可以通过强制终止此进程提前终止调优。只要为日志文件中的每个任务获得至少一个有效的调度，就应该能够进行编译（见下面的部分）。\n",
        "```"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 编译和评估\n",
        "\n",
        "在自动调优之后，可以用找到的最佳调度来编译网络。在自动调优期间，所有测量记录都被转储到日志文件中，因此可以读取日志文件并加载最佳调度。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "编译...\n",
            "Evaluate inference time cost...\n",
            "Execution time summary:\n",
            " mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  \n",
            "  673.4945     672.7222     675.7006     672.0608      1.5831   \n",
            "               \n"
          ]
        }
      ],
      "source": [
        "# 使用最佳调度编译\n",
        "print(\"编译...\")\n",
        "with auto_scheduler.ApplyHistoryBest(log_file):\n",
        "    with tvm.transform.PassContext(opt_level=3, config={\"relay.backend.use_auto_scheduler\": True}):\n",
        "        lib = relay.build(mod, target=target, params=params)\n",
        "\n",
        "# 创建 graph executor\n",
        "dev = tvm.device(str(target), 0)\n",
        "module = graph_executor.GraphModule(lib[\"default\"](dev))\n",
        "data_tvm = tvm.nd.array((np.random.uniform(size=input_shape)).astype(dtype))\n",
        "module.set_input(\"data\", data_tvm)\n",
        "\n",
        "# Evaluate\n",
        "print(\"评估推理时间成本...\")\n",
        "print(module.benchmark(dev, repeat=3, min_repeat_ms=500))"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 其他技巧\n",
        "\n",
        "1. 在调优过程中，自动调度器需要编译许多程序并从中提取特征。该部分是 CPU 密集型的，因此建议使用多核的高性能 CPU，以加快搜索速度。\n",
        "2. 可以使用 `python3 -m tvm.auto_scheduler.measure_record --mode distill -i log.json` 提取大的日志文件，只保存最好的有用的记录。\n",
        "3. 可以从上一个日志文件恢复搜索。在函数 `run_tuning` 中创建任务调度器时，只需要添加新的参数 `load_log_file`，即 `tuner = auto_scheduler.TaskScheduler(tasks, task_weights, load_log_file=log_file)`。\n",
        "4. 如果有多个目标 CPU，您可以将它们都用于测量，从而使测量并行化。请查阅[如何使用 RPC 跟踪器和 RPC 服务器](tutorials-autotvm-scale-up-rpc-tracker)。要在自动调度器中使用 RPC 跟踪器，请将 `TuningOptions` 中的运行器替换为 {any}`auto_scheduler.RPCRunner`。"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
